Search Results: "romain"

1 April 2015

Raphaël Hertzog: My Free Software Activities in March 2015

My monthly report covers a large part of what I have been doing in the free software world. I write it for my donors (thanks to them!) but also for the wider Debian community because it can give ideas to newcomers and it's one of the best ways to find volunteers to work with me on projects that matter to me.
Debian LTS
This month I have been paid to work 15.25 hours on Debian LTS. In that time I did the following:
That's it for the paid work. But still about LTS, I proposed two events for Debconf 15:
A Debian LTS logo
In my last Freexian LTS report, I mentioned briefly that it would be nice to have a logo for the LTS project. Shortly after I got a first logo prepared by Damien Escoffier and a few more followed: they are available on a wiki page (and the logo you see above is from him!). Following a suggestion of Paul Wise, I registered the logo request on another wiki page dedicated to artwork requests. That kind of collaboration is awesome! Thanks to all the artists involved in Debian.
Debian packaging
Django. This month has seen no less than 3 upstream point releases packaged for Debian (1.7.5, 1.7.6 and 1.7.7) and they have been accepted by the release team into Jessie. I'm pleased with this tolerance as I have argued the case for it multiple times in the past given the sane upstream release policy (bugfix only in a given released branch).
Python code analysis. A few months ago I discovered a tool combining the power of multiple Python code analysis tools: it's prospector. I just filed a Request for Package for it (see #781165) and someone already volunteered to package it, yay \o/
update-rc.d and systemd. While working on a Kali version based on Jessie, I got hit by what boils down to a poor interaction between systemd and update-rc.d (see #746580) and after some exchanges with other affected users I raised the severity to serious as we really ought to do something about it before release. I also opened #781155 on openbsd-inetd as its usage of inetd.service instead of openbsd-inetd.service (which is only provided as a symlink to the former) leads to multiple small issues.
Misc
Debian France. The general assembly is over and the new board elected its new president: it's now official, I'm no longer Debian France's president. Good luck to Nicolas Dandrimont who took on this responsibility.
Salt's openssh formula. I improved salt's openssh formula to make it possible to manage the /etc/ssh/ssh_known_hosts file referencing the public SSH keys of other managed minions.
Tendenci.com. I was looking for a free software solution to handle membership management of a large NPO and I discovered Tendenci. It looked very interesting feature-wise and is written with a language/framework that I enjoy (Python/Django). But while it's free software, there's no community at all. The company that wrote it released it under a free software license and it really looks like they did intend to build a community but they failed at it. When I looked, their development forums were web-based and mostly empty, with only initial discussions among the current developers and no reply from anybody. There's also no mention of an IRC channel or a mailing list. I sent them a mail to see what kind of collaboration we could expect if we opted for their software and got no reply. A pity, really. What free software membership management solution would you use when you have more than 10000 members to handle and when you want to use the underlying database to offer SSO authentication to multiple external services?
Thanks
See you next month for a new summary of my activities.

24 January 2015

Dirk Eddelbuettel: Rcpp 0.11.4

A new release 0.11.4 of Rcpp is now on the CRAN network for GNU R, and an updated Debian package will be uploaded in due course. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 323 packages on CRAN depend on Rcpp for making analyses go faster and further; BioConductor adds another 41 packages, and casual searches on GitHub suggest dozens more. This release once again adds a large number of small bug fixes, polishes and enhancements. And like the last time, these changes were made by a group of seven different contributors (counting code commits) plus three more providing concrete suggestions. This shows that Rcpp development and maintenance rests on a large number of (broad) shoulders. See below for a detailed list of changes extracted from the NEWS file.
Changes in Rcpp version 0.11.4 (2015-01-20)
  • Changes in Rcpp API:
    • The ListOf<T> class gains the .attr and .names methods common to other Rcpp vectors.
    • The [dpq]nbinom_mu() scalar functions are now available via the R:: namespace when R 3.1.2 or newer is used.
    • Add an additional test for AIX before attempting to include execinfo.h.
    • Rcpp::stop now supports improved printf-like syntax using the small tinyformat header-only library (following a similar implementation in Rcpp11)
    • Pairlist objects are now protected via an additional Shield<> as suggested by Martin Morgan on the rcpp-devel list.
    • Sorting is now prohibited at compile time for objects of type List, RawVector and ExpressionVector.
    • Vectors now have a Vector::const_iterator that is 'const correct' thanks to a fix by Romain following a bug report in rcpp-devel by Martyn Plummer.
    • The mean() sugar function now uses a more robust two-pass method, and new unit tests for mean() were added at the same time.
    • The mean() and var() functions now support all core vector types.
    • The setequal() sugar function has been corrected via suggestion by Qiang Kou following a bug report by Søren Højsgaard.
    • The macros major, minor, and makedev no longer leak in from the (Linux) system header sys/sysmacros.h.
    • The push_front() string function was corrected.
  • Changes in Rcpp Attributes:
    • Only look for plugins in the package's namespace (rather than entire search path).
    • Also scan header files for definitions of functions to be considered by Attributes.
    • Correct the regular expression for source files which are scanned.
  • Changes in Rcpp unit tests:
    • Added a new binary test which will load a pre-built package to ensure that the Application Binary Interface (ABI) did not change; this test will (mostly or) only run at Travis where we have reasonable control over the platform running the test and can provide a binary.
    • New unit tests for sugar functions mean, setequal and var were added as noted above.
  • Changes in Rcpp Examples:
    • For the (old) examples ConvolveBenchmarks and OpenMP, the respective Makefile was renamed to GNUmakefile to please R CMD check as well as the CRAN Maintainers.
Thanks to CRANberries, you can also look at a diff to the previous release. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.
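The changelog above mentions that the mean() sugar function now uses a more robust two-pass method. As a rough illustration of that general idea only (written in Python, and not the Rcpp implementation), the sketch below computes an ordinary mean first and then adds the average of the residuals as a correction. The function name two_pass_mean is an assumption made up for this example.

```python
def two_pass_mean(values):
    """Mean with a second, corrective pass over the residuals.

    Hypothetical helper for illustration; not part of Rcpp.
    """
    n = len(values)
    if n == 0:
        raise ValueError("mean of an empty sequence is undefined")
    # First pass: the usual sum divided by the count.
    first = sum(values) / n
    # Second pass: average the residuals around the first estimate
    # and add that (usually tiny) amount back as a correction.
    correction = sum(x - first for x in values) / n
    return first + correction


print(two_pass_mean([1.0, 2.0, 3.0, 4.0]))
```

The second pass costs one more sweep over the data, which is the price paid for reducing the rounding error that a single running sum can accumulate.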

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

7 January 2015

Dirk Eddelbuettel: RcppCNPy 0.2.4

A new release of the RcppCNPy package is now on CRAN. This release mostly solidifies and fixes things. Support for saving integer objects, which was expanded in release 0.2.3, was not entirely correct. Operations on big-endian systems were not up to snuff either. Wush Wu helped in getting this right with very diligent testing and patching, particularly on big-endian hardware. We also got a pull request from Romain to better reflect const correctness on the Rcpp side of things. Last but not least, we obliged the CRAN Maintainers by no longer assuming that one can call gzip from a system() call because, well, you guessed it.
Changes in version 0.2.4 (2015-01-05)
  • Support for saving integer objects was not correct and has been fixed.
  • Support for loading and saving on 'big endian' systems was incomplete, has been greatly expanded and corrected, thanks in large part to very diligent testing as well as patching by Wush Wu.
  • The implementation now uses const iterators, thanks to a pull request by Romain Francois.
  • The vignette no longer assumes that one can call gzip via system as the world's leading consumer OS may disagree.
CRANberries also provides a diffstat report for the latest release. As always, feedback is welcome and the rcpp-devel mailing list off the R-Forge page for Rcpp may be the best place to start a discussion. GitHub issue tickets are also welcome.

29 November 2014

Dirk Eddelbuettel: RcppArmadillo 0.4.550.1.0

A week ago, Conrad provided another minor release 4.550.0 of Armadillo which has since received one minor correction in 4.550.1.0. As before, I had created a GitHub-only pre-release of his pre-release which was tested against the almost one hundred CRAN dependents of our RcppArmadillo package. This passed fine as usual, and results are as always in the rcpp-logs repository. Processing and acceptance at CRAN took a little longer as around the same time a fresh failure in unit tests had become apparent on an as-of-yet unannounced new architecture (!!) also tested at CRAN. The R-devel release has since gotten a new capabilities() test for long double, and we now only run this test (for our rmultinom()) if the test asserts that the given R build has this capability. Phew, so with all that the new version is now on CRAN; Windows binaries have been built and I also uploaded new Debian binaries. Changes are summarized below; our end also includes added support for conversion of Field types, thanks to a short pull request by Romain.
Changes in RcppArmadillo version 0.4.550.1.0 (2014-11-26)
  • Upgraded to Armadillo release Version 4.550.1 ("Singapore Sling Deluxe")
    • added matrix exponential function: expmat()
    • faster .log_p() and .avg_log_p() functions in the gmm_diag class when compiling with OpenMP enabled
    • faster handling of in-place addition/subtraction of expressions with an outer product
    • applied correction to gmm_diag relative to the 4.550 release
  • The Armadillo Field type is now converted in as<> conversions
Courtesy of CRANberries, there is also a diffstat report for the most recent release. As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

30 September 2013

Dirk Eddelbuettel: RcppArmadillo 0.3.920.1

Along with the Rcpp 0.10.5 release yesterday, a new minor release 0.3.920.1 of RcppArmadillo came out. It is based on Conrad's Armadillo 3.920.0 plus a minor fix, and uses some of the new Rcpp features. Both packages are now on CRAN and also in Debian. This release contains both a nice set of new Armadillo features as well as some nice additions to RcppArmadillo, due again mostly to Romain. Some of the changes tie into the changes in Rcpp 0.10.5 as for example the ability to pass const and const ref more efficiently (and we seem to have forgotten an entry in the NEWS file). The complete list of changes is below.
Changes in RcppArmadillo version 0.3.920.1 (2013-09-27)
  • Upgraded to Armadillo release Version 3.920.1 (Agencia Nacional Stasi)
    • faster .zeros()
    • faster round(), exp2() and log2() when using C++11
    • added signum function: sign()
    • added move constructors when using C++11
    • added 2D fast Fourier transform: fft2()
    • added .tube() for easier extraction of vectors and subcubes from cubes
    • added specification of a fill type during construction of Mat, Col, Row and Cube classes, eg. mat X(4, 5, fill::zeros)
  • Initial implementation of wrap<subview>
  • Improved implementation of as<>() and wrap() for sparse matrices
  • Converted main vignette from LaTeX style minted to lstlisting which permits builds on CRAN; removed set BuildVignettes: FALSE.
Courtesy of CRANberries, there is also a diffstat report for the most recent release. As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

21 March 2013

Paul Tagliamonte: Hy: The joke just got pretty serious

(btw, the source is on github) During the sprints at PyCon in a spare moment after some awesome OpenGov hacking, I ended up doing the unthinkable: rpython support in Hy. Yes. That's right. Lisp > Python > C > x86 instructions. The thanks here goes to Romain Guillebert - who is a really funny rpythonista and took the time to sit down and help me with this frankly insane idea. It works, though. I filed a bug on shipping rpython bits from PyPy's Debian package (hi, tumbleweed!), which should make building this a skitch easier. Finally, and unrelatedly, I also just got a .hy > .pyc compiler working (huzzah!) which means no one will ever know we wrote anything in Lisp, ever. As always, play with the REPL, star the code or have a laugh with it. Let me know what you hack up! I'll post the lightning talk I gave after it's posted :)

28 February 2013

Dirk Eddelbuettel: inline 0.3.11

A maintenance release of inline is now on CRAN, and is being uploaded to Debian. The release fixes two minor bugs kindly reported by users. As the two previous releases appear to not have been announced here, their NEWS entries are included as well.
Changes in inline version 0.3.11 (2013-02-26)
  • Fix bug in cfunction for .C convention with raw vectors.
  • Correct cfunction to use .Platform$dynlib.ext as the file extension for the library file (unless on Windows).
  • Allow rcpp wrapper to pass another plugin (as eg RcppArmadillo)
Changes in inline version 0.3.10 (2012-10-03)
  • getDynLib() error message corrected as suggested by Yasir Suhail
  • Added rcpp() wrapper for cxxfunction() which sets plugin="Rcpp"
  • Converted NEWS to NEWS.Rd
  • New maintainer, after having coordinated releases (along with Romain) since 0.3.5 in June 2010
Changes in inline version 0.3.9 (2012-10-02)
  • Uncoordinating hijacking of package by CRAN maintainers with a single word change in cfunction.R to prevent an error under an unreleased version of R
Courtesy of CRANberries, there is also a diffstat report for the most recent release. A few more details are available at the R-Forge page.

5 February 2013

Dirk Eddelbuettel: New Rcpp page on upcoming events -- including Master Class in New York

Lots of exciting things are happening with and around Rcpp. I just added a new page about Upcoming Events to the recently-created Rcpp site. This events page has lots to cover: an upcoming talk at Columbia on March 8 (details still TBD), a day-long workshop in New York on March 9, a possible participation at a CERN / ROOT conference in Switzerland on May 11-14, an upcoming talk in May in Milwaukee, and last but not least the tutorial by Romain and Hadley at UseR! 2013 in Spain. Phew! With that, a few quick words about the upcoming master class in New York. It will be a full day, covering an introduction and motivation, details about the core data types, tools for working with and extending Rcpp and of course applications galore, including RcppArmadillo and RInside. I have done the same one day class format a few times before, most recently (with Revolution Analytics) in San Francisco in late 2011, and also as a two-part seminar at UseR! 2012. This time, we plan on providing cloud-hosted RStudio instances for participants. Better still, RStudio's own JJ Allaire will be on deck as well for RStudio --- and Rcpp Attributes --- questions. Details and registration information for the New York class are at this page.

3 February 2013

Dirk Eddelbuettel: The Rcpp Gallery and my Seinfeld Streak

A good three weeks ago, we introduced the Rcpp Gallery. While this is a joint effort by several of us on the Rcpp team, the backend was conceived and implemented entirely by JJ who also bootstrapped it with some first content, drawing on posts by Hadley, Romain and myself. As the How to contribute page makes plain, this is all backed by GitHub and all logs are public anyway. So after it was up and working, JJ and I refined the look and feel, and I started to add more content so that we would have something by the time the initial announcement came around. A few years ago I read about an (attributed) secret to Seinfeld's productivity: "Don't break the chain". Just keep writing, and write every day. I made my goal of a post every day for just over a month, and created this sequence: (20 Dec) simulating-pi, (21 Dec) vector-minimum, (22 Dec) gsl-colnorm-example, (23 Dec) fibonacci-sequence, (24 Dec) random-number-generation, (25 Dec) armadillo-sparse-matrix, (26 Dec) timing-rngs, (27 Dec) stl-inner-product, (28 Dec) stl-transform, (29 Dec) stl-transform-for-subsetting, (30 Dec) stl-random-shuffle, (31 Dec) stl-random-sample, (01 Jan) stl-for-each, (02 Jan) armadillo-subsetting, (03 Jan) accessing-environments, (04 Jan) armadillo-eigenvalues, (05 Jan) r-function-from-c++, (06 Jan) using-the-rcpp-timer, (07 Jan) sugar-function-clamp, (08 Jan) using-rcout, (09 Jan) first-steps-with-C++11, (10 Jan) simple-lambda-func-c++11, (11 Jan) eigen-eigenvalues, (12 Jan) getting-attributes-for-xts-example, (13 Jan) intro-to-exceptions, (14 Jan) a-first-boost-example, (15 Jan) a-second-boost-example, (16 Jan) timing-normal-rngs, (17 Jan) creating-xts-from-c++, (18 Jan) gsl-for-eigenvalues, (19 Jan) accessing-xts-api, (20 Jan) custom-as-and-wrap-example, (21 Jan) passing-cpp-function-pointers. The Rcpp Gallery continues to grow; we now have 58 posts from 7 different authors. And it is open for business: new contributions are always welcome.

1 February 2013

Dirk Eddelbuettel: Introducing the BH package

Earlier today a new package BH arrived on CRAN. Over the years, Jay Emerson, Michael Kane and I had numerous discussions about a basic Boost infrastructure package providing Boost headers for other CRAN packages (and yes, we are talking packages using C++ here). JJ and Romain chipped in as well, and Jay finally took the lead by first creating a repo on R-Forge. And now the package is out, so I just put together a quick demo post over at the Rcpp Gallery. As that post notes, BH is still pretty new and rough, and we probably missed some other useful Boost packages. If so, let one of us know.

26 October 2012

Dirk Eddelbuettel: Accelerating R code: Computing Implied Volatilities Orders of Magnitude Faster

This blog, together with Romain's, is one of the main homes of stories about how Rcpp can help with getting code to run faster in the context of the R system for statistical programming and analysis. By making it easier to get already existing C or C++ code to R, or equally to extend R with new C++ code, Rcpp can help in getting stuff done. And it is often fairly straightforward to do so. In this context, I have a nice new example. And for once, it is work-related. I generally cannot share too much of what we do there as this is, well, proprietary, but I have this nice new example. The other day, I was constructing (large) time series of implied volatilities. Implied volatilities can be thought of as the complement to an option's price: given a price (and all other observables which can be thought of as fixed), we compute an implied volatility (typically via the standard Black-Scholes model). Given a changed implied volatility, we infer a new price -- see this Wikipedia page for more details. In essence, it opens the door to all sorts of arbitrage and relative value pricing adventures. Now, we observe prices fairly frequently to create somewhat sizeable time series of option prices. And each price corresponds to one matching implied volatility, and for each such price we have to solve a small and straightforward optimization problem: to compute the implied volatility given the price. This is usually done with an iterative root finder. The problem comes from the fact that we have to do this (i) over and over and over for large data sets, and (ii) that there are a number of callbacks from the (generic) solver to the (standard) option pricer. So our first approach was to just call the corresponding function GBSVolatility from the fOptions package from the trusted Rmetrics project by Diethelm Wuertz et al.
This worked fine, but even with the usual tricks of splitting over multiple cores/machines, it simply took too long for the resolution and data amount we desired. One of the problems is that this function (which uses the proper uniroot optimizer in R) is not inefficient per se, but simply makes too many function calls back to the option pricer, as can be seen from a quick glance at the code. The helper function .fGBSVolatility gets called time and time again:
R> GBSVolatility
function (price, TypeFlag = c("c", "p"), S, X, Time, r, b, tol = .Machine$double.eps, 
    maxiter = 10000) 
{
    TypeFlag = TypeFlag[1]
    volatility = uniroot(.fGBSVolatility, interval = c(-10, 10), 
        price = price, TypeFlag = TypeFlag, S = S, X = X, Time = Time, 
        r = r, b = b, tol = tol, maxiter = maxiter)$root
    volatility
}
<environment: namespace:fOptions>
R> 
R> .fGBSVolatility
function (x, price, TypeFlag, S, X, Time, r, b, ...) 
{
    GBS = GBSOption(TypeFlag = TypeFlag, S = S, X = X, Time = Time, 
        r = r, b = b, sigma = x)@price
    price - GBS
}
<environment: namespace:fOptions>
So the next idea was to try the corresponding function from my RQuantLib package which brings (parts of) QuantLib to R. That turned out to be a lot faster already. Now, QuantLib is pretty big and so is RQuantLib, and we felt it may not make sense to install it on a number of machines just for this simple problem. So one evening this week I noodled around for an hour or two and combined (i) a basic Black/Scholes calculation and (ii) a standard univariate zero finder (both of which can be found or described in numerous places) to minimize the difference between the observed price and the price given an implied volatility. With about one hundred lines in C++, I had something which felt fast enough. So today I hooked this into R via a two-line wrapper in a quickly-created package using Rcpp. I had one more advantage here. For our time series problem, the majority of the parameters (strike, time to maturity, rate, ...) are fixed, so we can structure the problem to be vectorised right from the start. I cannot share the code or more of the details of my new implementation. However, both GBSVolatility and EuropeanOptionImpliedVolatility are on CRAN (and as I happen to maintain these for Debian, also just one sudo apt-get install r-cran-foptions r-cran-rquantlib away if you're on Debian or Ubuntu). And writing the other solver is really not that involved. Anyway, here is the result, courtesy of a quick run via the rbenchmark package. We create a vector of length 500; the implied volatility computation will be performed at each point (and yes, our time series are much longer indeed). This is replicated 100 times (as is the default for rbenchmark) for each of the three approaches:
xyz@xxxxxxxx:~$ r xxxxR/packages/xxxxOptions/demo/timing.R
    test replications elapsed  relative user.self sys.self user.child sys.child
3 zzz(X)          100   0.038     1.000     0.040    0.000          0         0
2 RQL(X)          100   3.657    96.237     3.596    0.060          0         0
1 fOp(X)          100 448.060 11791.053   446.644    1.436          0         0
xyz@xxxxxxxx:~$ 
The new local solution is denoted by zzz(X). It is already orders of magnitude faster than the RQL(X) function using RQuantLib (which is, I presume, due to my custom solution internalising the loop). And the new approach is a laughable amount faster than the basic approach (shown as fOp) via fOptions. For one hundred replications of solving implied volatilities for all elements of a vector of size 500, the slow solution takes about 7.5 minutes --- while the fast solution takes 38 milliseconds. Which comes to a relative gain of over 11,000. So sitting down with your C++ compiler to craft a quick one-hundred lines, combining two well-known and tested methods, can reap sizeable benefits. And Rcpp makes it trivial to call this from R.
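Since the actual implementation cannot be shared, here is a minimal sketch of the general recipe described above: a basic Black-Scholes pricer combined with a univariate zero finder, applied over a vector of prices with the other parameters held fixed. It is written in Python for illustration, not in the C++ of the post, and everything in it is an assumption: the function names, the restriction to European calls without dividends, and the choice of plain bisection as the zero finder. It shows the structure of the computation and says nothing about the speed-ups reported above.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot, strike, maturity, rate, sigma):
    """Black-Scholes price of a European call (no dividends assumed)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma * sigma) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, maturity, rate,
                lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Find sigma such that bs_call_price(sigma) equals price, by bisection.

    Bisection works here because the call price increases with sigma.
    The bracket [lo, hi] is an arbitrary choice for this sketch.
    """
    f_lo = bs_call_price(spot, strike, maturity, rate, lo) - price
    f_hi = bs_call_price(spot, strike, maturity, rate, hi) - price
    if f_lo > 0.0 or f_hi < 0.0:
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, maturity, rate, mid) - price > 0.0:
            hi = mid  # model price too high: volatility must be lower
        else:
            lo = mid  # model price too low: volatility must be higher
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def implied_vols(prices, spot, strike, maturity, rate):
    """Vectorised form: one solve per price, all other inputs fixed."""
    return [implied_vol(p, spot, strike, maturity, rate) for p in prices]


# Round trip: price three options at known volatilities, then recover them.
true_vols = [0.15, 0.20, 0.35]
prices = [bs_call_price(100.0, 100.0, 1.0, 0.05, s) for s in true_vols]
recovered = implied_vols(prices, 100.0, 100.0, 1.0, 0.05)
print([round(v, 6) for v in recovered])
```

Keeping the loop over prices inside a single function is the part that corresponds to "internalising the loop": the caller hands over a whole vector once instead of invoking the solver once per element.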

14 July 2012

Romain Francoise: Getting clickable URLs in xterm (sort of)

My terminal emulator of choice is xterm: it's fast, light, and (since it's pretty much the reference implementation) it has excellent support for everything a console user could desire... except for one thing that would be very, very convenient: making URLs clickable. Other terminal emulators have this feature, but they also have problems that make them inferior to xterm in different ways. Over the years people have come up with various workarounds for this situation, like screen scraping utilities (urlview, urlscan) that can be hooked up to other console programs to extract and browse URLs, but they're still not as convenient as just using the mouse, and often require the program to run on the same machine as the browser. Fortunately, xterm #277 (released in January 2012) added a new feature that provides almost exactly what I had been looking for: it can now spawn programs using the exec-formatted action and give them as argument the contents of the current selection or clipboard. So you can add the following to your ~/.Xresources:
*VT100*translations: #override Meta <Btn1Up>: exec-formatted("x-www-browser '%t'", PRIMARY)
which makes xterm run x-www-browser on the selection when it receives Alt + left click. (Adjust for whatever your Meta key is.) This is advantageously combined with a charClass setting to make xterm treat URLs as a single word, so that you can just double-click on them to select them:
XTerm*charClass: 33:48,36-47:48,58-59:48,61:48,63-64:48,95:48,126:48
With both of these enabled, opening URLs is now just a matter of:
  1. Double-clicking the URL to select it
  2. Doing Alt + click anywhere on the xterm window to run the browser
which, while more involved than a single click, is still much faster than having to copy the URL manually to the browser.
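The charClass value above is a comma-separated list of character codes (or code ranges) and the class number each is assigned to. To see which characters that particular setting touches, the small Python sketch below decodes it; parse_char_class is a hypothetical helper written for this illustration, and it only handles the "code:class" and "low-high:class" forms that appear in the setting. As far as I know, class 48 is the code of the character 0 and is the class xterm uses for digits and letters, so these punctuation characters end up selected together with the alphanumerics around them.

```python
def parse_char_class(spec):
    """Map each character code in a charClass spec to its class number.

    Hypothetical helper; handles only 'code:class' and 'low-high:class'.
    """
    classes = {}
    for item in spec.split(","):
        codes, cls = item.split(":")
        if "-" in codes:
            first, last = (int(c) for c in codes.split("-"))
        else:
            first = last = int(codes)
        for code in range(first, last + 1):
            classes[code] = int(cls)
    return classes


SPEC = "33:48,36-47:48,58-59:48,61:48,63-64:48,95:48,126:48"
mapping = parse_char_class(SPEC)
url_chars = "".join(chr(code) for code in sorted(mapping))
print(url_chars)  # !$%&'()*+,-./:;=?@_~
```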

24 March 2012

Dirk Eddelbuettel: Initial release 0.1.0 of package RcppSMC

Hm, I realized that I announced this on Google+ (via Rcpp) as well as on Twitter, on the r-packages list, wrote a new and simple web page for it, but had not put it on my blog. So here is some catching up. Sequential Monte Carlo / Particle Filter is a (to quote the Wikipedia page I just linked to) sophisticated model estimation technique based on simulation. Particle filters are related to both Kalman Filters and Markov Chain Monte Carlo methods. Adam Johansen has a rather nice set of C++ classes documented in his 2009 paper in the Journal of Statistical Software (JSS). I started to play with these classes and realized that, once again, this would make perfect sense in an R extension built with the Rcpp package by Romain and myself (and in JSS too). So I put a first prototype onto R-Forge and emailed Adam who, to my pleasant surprise, was quite interested. A couple of emails and commits later, we are happy to present a very first release 0.1.0. I wrote a few words on a RcppSMC page on my website where you can find a few more details. But in short, we already have example functions demonstrating the backend classes by reproducing examples from
Johansen (2009)
and his example 5.1 via pfLineartBS() for a linear bootstrap example;
Doucet, Briers and Senecal (2006)
and their (optimal) block-sampling particle filter for a linear Gaussian model (serving as an illustration as the setup does of course have an analytical solution) via the function blockpfGaussianOpt()
Gordon, Salmond and Smith (1993)
and their ubiquitous nonlinear state space model via the function pfNonlinBS().
And to illustrate just why Rcpp is so cool for this, here is a little animation of a callback from the C++ code when doing the filtering on Adam's example 5.1. By passing a simple plotting function, written in R, to the C++ code, we can get a plot updated on every iteration. Here I cheated a little and used our old plot function with fixed ranges; the package now uses a more general function. [Animation: example of RcppSMC callback to R plot when estimating example 5.1 from Johansen (2009)] The animation is of course due to ImageMagick glueing one hundred files into a single animated gif. More information about RcppSMC is on its page, and we intend to add more examples and extensions over time.
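To make the idea of a per-iteration callback concrete, here is a minimal, generic bootstrap particle filter in Python that reports its running estimate to a caller-supplied function after every observation. This is only a sketch under stated assumptions: it is neither Adam Johansen's C++ classes nor the RcppSMC interface, and the model (a one-dimensional linear Gaussian state space model), the function name bootstrap_filter, and all parameters and defaults are made up for illustration. In the post the callback is an R plotting function invoked from C++; here it is any Python callable.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, a=0.9, q=1.0, r=1.0,
                     on_step=None, seed=42):
    """Bootstrap particle filter for an assumed toy model:
        x_t = a * x_{t-1} + N(0, q),   y_t = x_t + N(0, r).
    Returns the filtered mean at each step; on_step(t, estimate) is
    called once per observation if provided.
    """
    rng = random.Random(seed)
    sd_q, sd_r = math.sqrt(q), math.sqrt(r)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for t, y in enumerate(observations):
        # Propagate every particle through the state equation.
        particles = [a * x + rng.gauss(0.0, sd_q) for x in particles]
        # Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / sd_r) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimate = sum(w * x for w, x in zip(weights, particles)) / total
        estimates.append(estimate)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Hand the current state to the caller, e.g. to update a plot.
        if on_step is not None:
            on_step(t, estimate)
    return estimates


steps_seen = []
result = bootstrap_filter([0.5, 1.0, 1.5],
                          on_step=lambda t, est: steps_seen.append(t))
print(steps_seen)  # [0, 1, 2]
```

Passing the callback in as an argument keeps the filtering loop free of any plotting or logging code, which is the same separation the animation in the post relies on.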

22 December 2011

Dirk Eddelbuettel: Rcpp 0.9.8

A new release 0.9.8 of Rcpp is now on CRAN and will also get into Debian shortly (once I finish building R 2.14.1). This release contains a few incremental changes. Romain, sponsored by the Open Source Programs Office at Google, had released a new package int64 bringing larger integers to R, and this is now supported by Rcpp as well. John Chambers contributed some code to have Reference Classes extend existing C++ classes (typically brought in via Rcpp Modules). Jelmer Ypma sent us a patch to add a Rcout device not unlike cout, but aligned with R's io buffering. We added some more unit tests, and made a few small fixes here or there. The complete NEWS entry is below; more details are in the ChangeLog file in the package and on the Rcpp Changelog page.
0.9.8   2011-12-21
    o   wrap now handles 64 bit integers (int64_t, uint64_t) and containers 
        of them, and Rcpp now depends on the int64 package (also on CRAN).
        This work has been sponsored by the Google Open Source Programs
        Office.
    o   Added setRcppClass() function to create extended reference classes 
        with an interface to a C++ class (typically via Rcpp Module) which
        can have R-based fields and methods in addition to those from the C++.
    o   Applied patch by Jelmer Ypma which adds an output stream class
        'Rcout' not unlike std::cout, but implemented via Rprintf to
        cooperate with R and its output buffering.
        
    o   New unit tests for pf(), pnf(), pchisq(), pnchisq() and pcauchy()
    o   XPtr constructor now checks for corresponding type in SEXP
    o   Updated vignettes for use with updated highlight package
    o   Update linking command for older fastLm() example using external 
        Armadillo
Thanks to CRANberries, you can also look at a diff to the previous release 0.9.7. As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page

20 November 2011

Dirk Eddelbuettel: RcppArmadillo 0.2.30 (and 0.2.29)

A few days ago, Conrad Sanderson released the first pre-release version of what will be Armadillo 2.4.*, giving it the 2.3.91 release handle. We folded this into RcppArmadillo release 0.2.30, with Romain making a few adjustments to our template structure to accommodate Conrad's underlying changes in Armadillo itself. Armadillo is a wonderfully expressive (thanks to clever modern template programming), powerful yet simple-to-use C++ library for linear algebra, making expressions in C++ as easy as writing in Matlab or R. By deploying our seamless Rcpp glue between R and C++, RcppArmadillo brings this nice C++ library to R users. The CRAN page for RcppArmadillo now lists ten packages using the RcppArmadillo package. There was also an earlier bug-fix release 0.2.29 which I had not blogged about separately. The NEWS entries summarising the changes for both are below:
0.2.30  2011-11-19
    o   Upgraded to Armadillo test release 2.3.91 "Loco Lounge Lizard (Beta 1)"
          * added shorter forms of transposes: .t() and .st()
          * added optional use of 64 bit indices, allowing matrices to have
            more than 4 billion elements
          * added experimental support for C++11 initialiser lists
          * faster pinv()
          * faster inplace transpose
          * bugfixes for handling expressions with aliasing and submatrices
          * refactored code to eliminate warnings when using the Clang C++
            compiler
          * .print_trans() and .raw_print_trans() are deprecated 
0.2.29  2011-09-01
    o   Upgraded to Armadillo release 2.2.3
          * Release fixes a speed issue in the as_scalar() function.
Courtesy of CRANberries, there are also diffstat reports for 0.2.30 relative to 0.2.29 and for 0.2.29 relative to 0.2.28. As always, more detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

6 November 2011

Dirk Eddelbuettel: Rcpp talk at Seattle RUG next month

The Seattle R User Group was kind enough to invite me to give a talk about R, C++ and Rcpp. So if you can make it to the Thomas building of the Fred Hutchinson Cancer Research Center in Seattle, WA, on December 7, I would love to see you there. I have some ideas about freshening up the presentation(s) based on material Romain and I have used in the past. This should make the why as well as the how a little clearer; now I just have to find some time to put this together. And if there are particular aspects you would like to see covered, please do get in touch with me.

11 October 2011

Christian Perrier: RWC 2011 : after 1/4 finals

I was expecting to write more about the 7th Rugby World Cup, but real life, Debian work and running activities prevented me from doing so. Still, I watched several games and I can share my feelings now. First of all, this is a tremendous organization from New Zealand. It seems that nearly the entire country is working on making this event a great success, and I really appreciate seeing a place I certainly have to visit some day receive such worldwide attention (OK, admittedly, mostly in the part of the world that understands rugby). The first round already gave several surprises, even if the 1/4 finals were after all kinda expected (at least, the list of countries). Which leads us to the following 1/4 finals (in parentheses are my bets): If you know about the results, you know I screwed it up nearly completely..:-) The Welsh team played a great game against an uninspired Irish team. It has certainly been the best 1/4 final and they certainly deserve their win. Even though I usually tend to be supportive of Ireland, I was very balanced here, and finally turned out to be in favor of the Welsh. We really have to fear them in the semi-finals. Australia-South Africa was theoretically the most exciting 1/4 final but finally turned out to be quite boring. Both teams insisted on playing mostly to occupy their opponent's part of the field rather than trying to score, then relying on penalties for points. That was apparently the right tactic for Australia, and they also deserve their win against a South African team whose forwards were not as decisive as they sometimes are. Argentina was again there, and really there. It has been the only team up to now to lead against New Zealand. And that was deserved. What a wonderful first half! Obviously, it was impossible for them to resist during the second half and it slowly became obvious that the Blacks (saved during the first half by the kicks of a very inspired Piri Weepu) would finally manage to score tries. 
But, still, our Argentinian friends, for instance the "inoxydable" (indestructible) "Super Mario" Ledesma, or the tireless Felipe Contepomi, were not here as sacrificial victims. What to say about England-France? The first half was astonishing for us, of course. This is what we love (and hate) with our beloved French team. Definitely the team that can make surprises and, imho, the only one that can beat New Zealand if that has to happen (but also the only one that can be entirely crushed by them). The defeat against Tonga and the week that followed completely transformed them. A stunning 3rd row, defending against each and every attempt by England to invade "la patrie en danger". A back line with the magicians of Toulouse (Clerc and Médard) as the ideal finishers of magic play by Parra, Trinh Duc, Mermoz, Palisson (the good surprise of this world cup, Alexis). And, during the second half, a thrilling resistance to the assaults of the British White Knights, concluded by that liberating drop goal by Trinh Duc. For sure, with games like this, they can beat everybody and by everybody, I mean everybody. Remember Millennium Stadium in 2007..:-) So, well. Australia-New Zealand and France-Wales. I know where my heart lies for both games. The Blacks and Les Bleus in the final, this is what we hope (and fear...), but both teams, particularly France, will first have to climb quite a big wall before reaching this.

1 October 2011

Dirk Eddelbuettel: Reminder: One week til Rcpp class in San Francisco

Just a quick note to remind everyone that the Rcpp class in San Francisco, which I am holding together with Revolution Analytics, will take place a week from today. We are happy to report that the number of registrations has met our initial targets. But as a number of open slots remain, we have decided to offer a few places at discounts of 25% for academics (with code acad1) and 50% for students (with code student). Course details are at the Revolution course page, registration is at the Eventbrite page. And just for completeness, here is what I wrote in the previous announcement:
The format will follow the workshop Romain and I gave during the tutorial day preceding this year's R/Finance conference. The style will once again be hands-on, with copious concrete examples and solid coverage of most aspects of Rcpp and related packages such as RInside, RcppArmadillo and others. The eight-hour schedule contains about six hours of instruction, split into four sessions of around ninety minutes. This leaves ample time for both lunch and coffee breaks, and for informal discussions and Q+A. The one-day class will be offered in San Francisco on Saturday, October 8, 2011. Please see the official course page for more details, concrete location info and maps as well as registration details.
Feel free to contact me at the usual email address with questions. Or with suggestions for the after-party in San Francisco :)

5 August 2011

Dirk Eddelbuettel: New Rcpp master classes scheduled for New York and San Francisco

Together with Revolution Analytics, I will be offering two more one-day classes on the Rcpp package for seamless integration of R and C++. The format will follow the workshop Romain and I gave during the tutorial day preceding this year's R/Finance conference. The style will once again be hands-on, with copious concrete examples and solid coverage of most aspects of Rcpp and related packages such as RInside, RcppArmadillo and others. The eight-hour schedule contains about six hours of instruction, split into four sessions of around ninety minutes. This leaves ample time for both lunch and coffee breaks, and for informal discussions and Q+A. Two one-day classes will be offered: the first in New York on Saturday, September 24, 2011, and the second two weeks later in San Francisco on Saturday, October 8, 2011. Please see the official course page for more details, concrete location info and maps as well as registration details. Feel free to contact me at the usual email address with questions.

6 July 2011

Dirk Eddelbuettel: Even faster linear model fits with R using RcppEigen

Linear regression models are a major component of every applied researcher's toolbox. Obtaining results more quickly is therefore of central importance, particularly when many such models have to be fit. Common examples in this context are Monte Carlo simulation or bootstrapping. My talks introducing High Performance Computing with R (see e.g. these slides from a five-hour workshop at the ISM in Tokyo) frequently feature an example of how to extend R with dedicated compiled code for linear regressions. Romain and I also frequently use this as a motivating example with our Rcpp package for seamless R and C++ integration. In fact, the examples directory for Rcpp still contains an earlier version of a benchmark for fastLm(), a faster alternative for R's lm() and lm.fit() functions. We have also extended this with the RcppArmadillo package which brings Conrad Sanderson's excellent Armadillo library with templated C++ code for linear algebra to R, as well as a simple integration to the GNU GSL via our RcppGSL package. The Rcpp section on my blog contains several posts about fastLm benchmarks. Doug Bates has been a key Rcpp contributor, helping particularly with the initial Armadillo integration. His research, however, also requires high-performance sparse matrix operations which Armadillo does not yet offer. So Doug has started to explore the Eigen project, a free C++ template math library mainly focused on vectors, matrices, and linear algebra (note that we will refer to the Eigen, Eigen2 and Eigen3 APIs as just 'Eigen' here, focusing on the latest version, Eigen3). Better still, Doug went to work and pretty much single-handedly wrote a new package RcppEigen which integrates the templated C++ library Eigen with R using Rcpp. RcppEigen also provides a fastLm implementation and benchmark script. In fact, it contains a full six different implementations as Doug is keenly interested in rank-revealing decompositions which can guard against ill-conditioned model matrices. 
Some more background information on this is also available in Doug's article on Least Squares Calculations in R in R News 4(1). Doug's implementation also uses an elegant design. It comprises a base class with common functionality, and six subclasses which specialize accordingly for these six different decomposition approaches: On my server, the result of running the included benchmark script lmBenchmark is as follows:
lm benchmark for n = 100000 and p = 40: nrep = 20
     test   relative elapsed user.self sys.self
7     LLt   1.000000   0.918      0.91     0.00
3    LDLt   1.002179   0.920      0.92     0.00
5 SymmEig   3.021786   2.774      2.19     0.57
6      QR   5.136166   4.715      4.24     0.48
2   PivQR   5.303922   4.869      4.27     0.58
8    arma   6.592593   6.052      6.03     0.02
1  lm.fit   9.386710   8.617      7.14     1.45
4     SVD  33.858388  31.082     30.19     0.84
9     GSL 114.972767 105.545    104.79     0.63
From this first set of results, the preferred method may be 'PivQR', the pivoted QR. Strictly speaking, it is the only one we can compare to lm.fit(), which also uses a pivoting scheme. In the case of a degenerate (rank-deficient) model matrix, all the other methods, including the four fastest approaches, are susceptible to producing incorrect estimates. Doug plans to make SVD and SymmEig rank-revealing too. As for pure speed, the LLt and LDLt decompositions have almost identical performance, and are clearly faster than the other approaches. Compared to lm.fit(), which is the best one could do with just R, we see an improvement by a factor of about nine in this run (8.617 versus 0.918 seconds elapsed), which is quite impressive (albeit not robust to rank-deficient model matrices). Apart from the SVD, all approaches using Eigen are faster than the one using Armadillo, which itself is still faster than R's lm.fit(). Doug and I were very surprised by the poor performance of the GNU GSL (which also uses SVD) via RcppGSL. Now, Eigen uses its own code for all linear algebra operations, bypassing BLAS and LAPACK libraries. The results above were achieved with the current Atlas package in Ubuntu. If we take advantage of the BLAS / LAPACK plug-in architecture offered on Debian / Ubuntu systems (see the vignette in my gcbd package for more) and use Goto BLAS, which provides tuning as well as parallelism on multi-core machines, the results are as follows:
lm benchmark for n = 100000 and p = 40: nrep = 20
     test   relative elapsed user.self sys.self
3    LDLt   1.000000   0.907      0.90     0.00
7     LLt   1.000000   0.907      0.91     0.00
5 SymmEig   2.981257   2.704      2.14     0.56
6      QR   5.004410   4.539      4.03     0.50
8    arma   5.072767   4.601     15.30     3.05
2   PivQR   5.307607   4.814      4.27     0.55
1  lm.fit   8.302095   7.530      9.55    12.25
4     SVD  33.015436  29.945     29.06     0.85
9     GSL 195.413451 177.240    244.64   319.89
We see that the BLAS-using Armadillo approach improves a little and moves just slightly ahead of the pivoted QR. On the other hand, lm.fit(), which also uses a pivoting scheme and hence only level 1 BLAS operations, changes less. GSL performs even worse (and it is unclear why). Doug's post announcing RcppEigen on the Eigen list has a few more sets of results. This post has illustrated some of the performance gains that can be obtained from using Eigen via RcppEigen. When not using rank-revealing methods, computing time can be reduced by a factor of eight to nine relative to lm.fit(). Rank-revealing methods can still improve by almost a factor of two. The main disadvantage of Eigen may be one of the reasons behind its impressive performance: its heavily templated code does not use BLAS, and the resulting object code (as e.g. in RcppEigen) becomes enormous (when compiling with debugging symbols). As one illustration, the shared library for RcppEigen on my Ubuntu 64-bit system has a size of 24.6 MB whereas RcppArmadillo comes in at a mere 0.78 MB; without debugging symbols, RcppEigen is a more reasonable 0.52 MB. The performance of Eigen is certainly intriguing, and its API is rather complete. It seems safe to say that we may see more R projects making use of Eigen thanks to the RcppEigen wrapper.
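To make the trade-off between speed and robustness more concrete, here is a minimal plain-C++ sketch of the idea behind the 'LLt' entry in the benchmark: form the normal equations X'X b = X'y and solve them with a Cholesky factorisation. This is not the RcppEigen code, which relies on Eigen's own decompositions; the names `Matrix` and `lstsq_llt` and the tolerance `tol` are assumptions made purely for illustration. The sketch also shows why this fast route is not rank-revealing: with a rank-deficient model matrix a pivot collapses to (numerically) zero and all the function can do is report failure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix: n rows, each holding p columns (illustrative type only).
using Matrix = std::vector<std::vector<double>>;

// Least squares fit of y = X b via the normal equations (X'X) b = X'y and a
// Cholesky (LLt) factorisation. Returns false when X'X is not numerically
// positive definite, e.g. when X is rank deficient.
bool lstsq_llt(const Matrix& X, const std::vector<double>& y,
               std::vector<double>& beta, double tol = 1e-10) {
    const std::size_t n = X.size();
    if (n == 0 || y.size() != n) return false;
    const std::size_t p = X[0].size();

    // Accumulate the lower triangle of A = X'X and the vector c = X'y.
    Matrix A(p, std::vector<double>(p, 0.0));
    std::vector<double> c(p, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(X[i].size() == p);  // all rows must have p columns
        for (std::size_t j = 0; j < p; ++j) {
            c[j] += X[i][j] * y[i];
            for (std::size_t k = 0; k <= j; ++k) A[j][k] += X[i][j] * X[i][k];
        }
    }

    // Cholesky factorisation A = L L', computed column by column.
    Matrix L(p, std::vector<double>(p, 0.0));
    for (std::size_t j = 0; j < p; ++j) {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= tol * A[j][j]) return false;  // pivot vanished: give up
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < p; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }

    // Forward substitution: solve L z = c.
    std::vector<double> z(p, 0.0);
    for (std::size_t i = 0; i < p; ++i) {
        double s = c[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i][k] * z[k];
        z[i] = s / L[i][i];
    }

    // Back substitution: solve L' beta = z.
    beta.assign(p, 0.0);
    for (std::size_t i = p; i-- > 0;) {
        double s = z[i];
        for (std::size_t k = i + 1; k < p; ++k) s -= L[k][i] * beta[k];
        beta[i] = s / L[i][i];
    }
    return true;
}
```

For an intercept column plus the regressor 0, 1, 2, 3 and responses 1, 3, 5, 7, the function recovers intercept 1 and slope 2. If the same column is supplied twice, it returns false instead of producing estimates, which is exactly the situation where a pivoted, rank-revealing method such as 'PivQR' or lm.fit() is the safer choice.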

Update: Clarified statement about large object size which was entirely due to building with debugging support.
